
fix(sf): dry-run the ReportCard + Director advisory tail on the Friday preflight (ROADMAP L4504)#372

Merged: cipher813 merged 1 commit into main from fix/director-shell-run-guard-L4504 on Jun 5, 2026


@cipher813
Owner

@cipher813 cipher813 commented Jun 5, 2026

Closes the L4504 pre-flip blocker (config ROADMAP, #477 merged). SF half; companion handler PR: alpha-engine-evaluator #23.

Revised from the earlier hard-skip approach to dry-execution per review — the Friday preflight should exercise the advisory Lambdas' bootstrap/import/IAM paths, not skip them (the shell-run keystone deliberately removed all skip-exceptions).

Problem

The Friday preflight (shell_run=true) dry-executes the Saturday SF. ReportCard and Director were added after the keystone with no dry path: their payload is only {date}, and the Director Lambda gates solely on DIRECTOR_ENABLED. Once DIRECTOR_ENABLED is flipped on, a preflight would run ReportCard for real over backtest/{Fri-date}/*, a prefix the dry workload never wrote, and produce a degenerate card. It would then fire a real Opus Director call and merge the resulting plan into the shared, non-date-scoped carry-over ledger director/carryover_ledger.json, polluting the state the real Saturday run reads. This is a correctness bug, not just wasted cost.
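A minimal sketch of the collision described above, assuming a plain dict stands in for the S3 bucket and the ledger is a JSON list of plan entries. The ledger's real schema is not shown in this PR, so `merge_plan` and the entry fields are hypothetical; only the two key shapes come from the description.

```python
import json

# Shared key from the PR description: note there is no date component.
LEDGER_KEY = "director/carryover_ledger.json"


def backtest_prefix(run_date: str) -> str:
    # Date-scoped prefix: a Friday preflight and the Saturday run never collide here.
    return f"backtest/{run_date}/"


def merge_plan(store: dict, plan: dict) -> None:
    # Hypothetical merge: append the plan to whatever the ledger already holds.
    ledger = json.loads(store.get(LEDGER_KEY, "[]"))
    ledger.append(plan)
    store[LEDGER_KEY] = json.dumps(ledger)


store: dict = {}  # stand-in for the bucket

# Friday preflight run "for real": nothing was written under the Friday
# backtest prefix, so the plan is built from a degenerate card.
merge_plan(store, {"date": "2026-06-05", "source": "preflight", "card": "degenerate"})

# The Saturday run reads the same ledger key and inherits the preflight entry.
saturday_view = json.loads(store[LEDGER_KEY])
```

The backtest inputs are isolated by date, but the ledger is not, which is why the preflight must never reach the write.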

Fix

Thread "dry_run.$": "$.research_dry" into both payloads — the canonical shell-run-dry signal (seeded false in InitializeInput, flipped true in ApplyShellRunDefaults), exactly how the eval-judge / rationale-clustering / replay-concordance / counterfactual advisory Lambdas already run dry. The handlers (evaluator #23) then:

  • ReportCard dry → no-write (still boots/imports/reads/computes every tile).
  • Director dry → no-Opus / no-write probe: constructs the real client (validates the langchain-anthropic import + the SSM key-fetch IAM grant), reads the ledger, stops short of .invoke(), mutates nothing.
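A rough sketch of the Director dry-probe contract described in the bullets above. The real handler lives in alpha-engine-evaluator #23 and is not shown in this PR, so the function name, the injected callables, and the return shape are all hypothetical; only the behaviour (construct the client, read the ledger, skip `.invoke()`, write nothing) comes from the description.

```python
class FakeClient:
    """Stand-in for the real LLM client; counts invoke() calls."""

    def __init__(self) -> None:
        self.calls = 0

    def invoke(self, payload: dict) -> dict:
        self.calls += 1
        return {"plan": "noop"}


def director_handler(event, build_client, read_ledger, write_ledger) -> dict:
    # Hypothetical handler shape; dependencies are injected so the sketch runs offline.
    dry_run = bool(event.get("dry_run", False))
    client = build_client()  # runs in both modes: exercises the import and key fetch
    ledger = read_ledger()   # read-only in both modes
    if dry_run:
        # Dry probe: stop short of invoke() and mutate nothing.
        return {"status": "dry_ok", "ledger_entries": len(ledger)}
    plan = client.invoke({"date": event["date"], "ledger": ledger})
    write_ledger(ledger + [plan])
    return {"status": "ok", "ledger_entries": len(ledger) + 1}


client = FakeClient()
writes: list = []
result = director_handler(
    {"date": "2026-06-05", "dry_run": True},
    build_client=lambda: client,
    read_ledger=lambda: [{"plan": "carried"}],
    write_ledger=writes.append,
)
```

Building the client before checking `dry_run` is the point of the probe: a broken import or a missing IAM grant fails on Friday instead of on Saturday.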

WaitForWeeklySubstrateHealthCheck → ReportCard → Director wiring is unchanged; on a preflight both states run dry, they are not skipped.
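For illustration, the payload wiring can be sketched as a Python dict plus a toy resolver. In Amazon States Language, a key ending in `.$` is resolved as a JSONPath against the state input. The `date.$` path and the `resolve_payload` helper are assumptions for this sketch (the PR only says the payload was `{date}`), and the resolver handles top-level `$.name` paths only.

```python
# Assumed payload template for both advisory states; "dry_run.$" is the new key.
ADVISORY_PAYLOAD = {"date.$": "$.date", "dry_run.$": "$.research_dry"}


def resolve_payload(template: dict, state_input: dict) -> dict:
    # Toy resolver: supports only top-level "$.name" paths, unlike real JSONPath.
    resolved = {}
    for key, value in template.items():
        if key.endswith(".$"):
            resolved[key[:-2]] = state_input[value[2:]]
        else:
            resolved[key] = value
    return resolved


# research_dry is seeded false in InitializeInput ...
saturday = resolve_payload(ADVISORY_PAYLOAD, {"date": "2026-06-06", "research_dry": False})
# ... and flipped true by ApplyShellRunDefaults on the preflight.
preflight = resolve_payload(ADVISORY_PAYLOAD, {"date": "2026-06-05", "research_dry": True})
```

The same template serves both runs; only the value of `research_dry` in the execution state decides whether the handlers run dry.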

Tests

  • Updated the 2 tests pinning the WaitForWeeklySubstrateHealthCheck → ReportCard edge.
  • Added test_advisory_tail_runs_dry_on_preflight (asserts dry_run.$=$.research_dry on both payloads).
  • Updated the _SATURDAY_PAYLOAD_KEYS registry for the new key.
  • 497 SF-structure tests pass.
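A sketch of what `test_advisory_tail_runs_dry_on_preflight` might assert, assuming the definition is available as a dict and each Lambda payload sits under `Parameters.Payload`. The inline definition is a stand-in, not the repository's state machine, and the helper name is hypothetical.

```python
ADVISORY_STATES = ("ReportCard", "Director")


def assert_advisory_tail_runs_dry(definition: dict) -> None:
    # Both advisory payloads must carry the canonical shell-run-dry signal.
    for name in ADVISORY_STATES:
        payload = definition["States"][name]["Parameters"]["Payload"]
        assert payload.get("dry_run.$") == "$.research_dry", name


def test_advisory_tail_runs_dry_on_preflight() -> None:
    # Stand-in definition with only the fields this check reads.
    payload = {"date.$": "$.date", "dry_run.$": "$.research_dry"}
    definition = {
        "States": {
            "ReportCard": {"Type": "Task", "Parameters": {"Payload": dict(payload)}},
            "Director": {"Type": "Task", "Parameters": {"Payload": dict(payload)}},
        }
    }
    assert_advisory_tail_runs_dry(definition)
```

Pinning the exact JSONPath string, rather than just the key's presence, catches a payload wired to the wrong flag.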

Deploy note

SF auto-deploys on merge to main; the already-merged data #371 fix (--definition file://) clears the prior ARG_MAX limit. This PR and evaluator #23 should land together, because the SF references the handler's dry_run contract.

🤖 Generated with Claude Code

The Friday-PM Preflight Pipeline (shell_run=true) dry-executes the Saturday
SF to exercise bootstrap paths, but ReportCard + Director have no dry path
(their payload is only {date}; the Director Lambda gates solely on
DIRECTOR_ENABLED). Left ungated, once DIRECTOR_ENABLED is flipped on, a Friday
preflight would run ReportCard for real over backtest/{Fri-date}/* objects the dry
workload never wrote (a degenerate, mostly-N/A card) and fire a real Opus
Director call that merges that plan into the SHARED, non-date-scoped carry-over
ledger (director/carryover_ledger.json) — polluting the state the real Saturday
run reads. Correctness bug, not just wasted cost.

Adds a CheckShellRunSkipDirector Choice between WaitForWeeklySubstrateHealthCheck
and ReportCard: on shell_run=true it hard-skips BOTH advisory states straight to
the shell-run-aware notify gate; shell_run absent/false → Default → ReportCard,
byte-identical to the pre-guard real Saturday run. The preflight's purpose is
bootstrap exercise and both advisory Lambdas already have their own canaries.
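For reference, a Choice gate of this kind could look roughly like the dict below (serializing it with `json.dumps` would give the ASL JSON). The merged definition is not shown on this page, so the rule shape is an assumption and `NotifyGate` is a hypothetical name for the shell-run-aware notify gate. The `IsPresent` guard is included because an ASL Choice rule fails at runtime when its `Variable` path is missing, and the message says an absent `shell_run` should fall through to Default.

```python
CHECK_SHELL_RUN_SKIP_DIRECTOR = {
    "Type": "Choice",
    "Choices": [
        {
            "And": [
                {"Variable": "$.shell_run", "IsPresent": True},
                {"Variable": "$.shell_run", "BooleanEquals": True},
            ],
            "Next": "NotifyGate",  # hypothetical state name
        }
    ],
    "Default": "ReportCard",
}


def _matches(cond: dict, state_input: dict) -> bool:
    # Handles only the two comparison operators used above, on top-level paths.
    name = cond["Variable"][2:]
    if "IsPresent" in cond:
        return (name in state_input) == cond["IsPresent"]
    return state_input.get(name) is cond["BooleanEquals"]


def next_state(choice: dict, state_input: dict) -> str:
    # Toy evaluator for this one rule shape, not a general ASL interpreter.
    for rule in choice["Choices"]:
        if all(_matches(cond, state_input) for cond in rule["And"]):
            return rule["Next"]
    return choice["Default"]
```

With `shell_run` true the gate jumps past both advisory states; with it false or absent the Default edge leads to ReportCard as before.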

Unblocks the Phase-F DIRECTOR_ENABLED=true flip. Tests updated +
test_shell_run_skips_advisory_tail added (497 SF-structure tests pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cipher813 cipher813 merged commit d000812 into main Jun 5, 2026
1 check passed
@cipher813 cipher813 deleted the fix/director-shell-run-guard-L4504 branch June 5, 2026 13:14
@cipher813 cipher813 changed the title from "fix(sf): shell-run/preflight guard for the advisory tail (ROADMAP L4504) — unblocks DIRECTOR_ENABLED flip" to "fix(sf): dry-run the ReportCard + Director advisory tail on the Friday preflight (ROADMAP L4504)" Jun 5, 2026
@cipher813
Owner Author

Note for the record: this PR merged the hard-skip CheckShellRunSkipDirector Choice-gate version. The title was edited mid-flight to the dry-execution wording before I'd pushed the rework, so it doesn't match the merged code. Follow-up #373 converts the hard-skip to the chosen dry-execution approach (companion handler PR alpha-engine-evaluator #23). Interim state is safe — hard-skip also prevents the ledger pollution.

cipher813 added a commit that referenced this pull request Jun 5, 2026
#373)

Follow-up to #372 (merged), which shipped the hard-skip CheckShellRunSkipDirector
Choice gate. Per the chosen approach, convert it to keystone-consistent dry-
execution so the Friday preflight still exercises the ReportCard + Director
Lambda bootstrap/import/IAM/transport paths instead of skipping them (the shell-
run keystone deliberately removed all skip-exceptions).

- Remove the CheckShellRunSkipDirector Choice; restore the direct
  WaitForWeeklySubstrateHealthCheck -> ReportCard edge.
- Thread dry_run.$=$.research_dry into the ReportCard + Director payloads (the
  canonical shell-run-dry signal; false on the real Saturday run / true on the
  preflight), mirroring the eval-judge / rationale-clustering / replay-concordance
  / counterfactual advisory Lambdas.

Handlers (companion alpha-engine-evaluator #23): ReportCard dry -> no-write
(still boots/imports/reads/computes); Director dry -> no-Opus / no-write probe
(constructs the real client to validate the langchain import + SSM key-fetch IAM
grant, reads the ledger, stops short of .invoke(), mutates nothing — the shared
non-date-scoped carry-over ledger is never polluted).

Tests updated to expect the dry-execution wiring + test_advisory_tail_runs_dry_on_preflight
+ payload-key registry. 497 SF-structure tests pass.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>